- Special Edition Using Visual Basic Script -

CHAPTER 24 - Building Internet Rating Applications

by William R. Beem


As mainstream consumers converge upon the Internet, they bring with them a collection of baggage accumulated over the past few decades. The Internet was once the playground of academics and government agencies. For the most part, your average person off the street would have considered the content listless and dull. The content today is quite a bit more diverse. As consumers move into the Internet, they bring along the things that interest them most. While Madison Avenue types try to sell us everything from Pepsi to the latest movie, other parts of the Internet aren't quite as wholesome. One of the most pervasive items in the consumer world is sex. For good or bad, it's on the Internet and the Web.

How did the Internet move from boring technical diatribes to sex-for-sale? It's not that hard to imagine. Back in the 1980s, parents wanted to provide their college-bound youth with computers. The universities complied with the idea by providing network connections to each dorm room, and sometimes requiring a personal computer for admission. The next part is a simple matter of human nature. What happens when you give a bunch of college students with raging youthful hormones access to an enormous network? You get creative ways to distribute sex-related material around the world in the blink of an eye.

As you can see, the influence of sex permeated the Internet before the average consumer jumped on board in the 1990s. In fact, sex-related topics probably helped drive Internet acceptance, just as sexy movies spurred VCR sales in the 1980s. While some of the prudish among us may wish to eradicate sexual content completely, that simply isn't going to happen. There's too much interest among some users and that means someone will always be around to provide material for them to use.

Problems still remain, though. Chief among them are the issues described in the following sections.

Content Issues

The chief concern right now is how easy it is for anyone to access sexually related Internet sites, whether intentionally or not. In other areas of life, we have safeguards to prevent children from accessing adult entertainment clubs, movies, and magazines. With a broader range of people using the Internet, it seems appropriate that we devise similar safeguards for online access to similar adult material.

Although many people never consider the problem of corporate users wasting time at work, it is very real. Some corporate studies estimate that employees waste as much as 40 percent of their time on the Web viewing sites unrelated to their work. Interestingly, not all of them are looking for sexual content. One major aerospace firm has a tremendous problem with employees spending more time looking at current sports scores than reviewing the less exciting technical publications on the Web.

It's Not Censorship

Not everyone is trying to prevent someone else from accessing inappropriate material. Some people just want to protect themselves from unwanted images. It's quite easy to run across sexual content while surfing from one link to another, or by performing searches on seemingly harmless topics. For example, not everyone who enjoys photography wants to follow links to some of the adult subjects often associated with the profession. A person interested in building models may find unanticipated entries in the results of a search query. While investigating the topic for this chapter, I discovered far more adult links than links to Internet Ratings material.

Each of these scenarios provides incentive for some measure of content selection. Although the government recently passed an intrusive Communications Decency Act (CDA), the reality is that many of these sites are still online and do nothing to prevent access by those who shouldn't be there. Fortunately, the Internet is a community unto itself, and it responds when a need arises.

A free market economy tends to rise up and meet demand. In this case, we have several new software packages on the market that enable parents and supervisors to control which Internet sites their children and users may access. Some of these packages include SurfWatch and CyberPatrol. Online services, such as CompuServe and America Online, are also providing parental controls features with their user software.

The Platform for Internet Content Selection (PICS) Standard

In August 1995, representatives from 23 organizations gathered with MIT's World Wide Web Consortium to discuss content labeling for use with content selection software. The attendees had two major considerations in mind.

The result of the conference was PICS, the Platform for Internet Content Selection. The PICS design allows a supervisor to block access to certain Internet sites. The key difference between PICS and the CDA is censorship. While the government bill seeks to eradicate anything it considers inappropriate, the PICS standard allows users merely to avoid material they don't want to see, without imposing the same restrictions on other Internet users.

It Takes Two

PICS requires action on both the client and server side of the connection. Content providers apply labels in their HTML code that describe the rating level of the material on their site. The users require software that detects these labels and acts upon them based upon permission settings that a parent or supervisor defines for the user. If both parties adhere to the same rating system, a user does not need to actually see the material in order to know that he or she wants to block it from view.

How do we define what is or isn't appropriate? There are three factors at work here.

These three factors present a need for a flexible method of blocking content. They also present a problem, since different people disagree upon which content is or isn't harmful. Consider television programming as an example. In accordance with FCC regulations, broadcast television stations restrict nudity and sexual content in programs. Nevertheless, television is rife with blood and guts violence. Suggestively sexual programming comes on late in the evening, presumably when parents are home to supervise their children's watching habits, yet Saturday afternoons are often filled with war movies. There's a growing movement objecting to television violence.

The PICS standard tackles this problem by defining a mechanism for implementing rating structures, and then allowing others to implement the ratings systems. Prior to PICS, a cottage industry came forth to fill the need. Instead of determining the ratings structure themselves, supervisors subscribed to services that catered to their needs. Different entities, such as the selection software vendors CyberPatrol and SurfWatch, defined these subscription-based rating services.

The Old Way

The selection software positions itself between the user and the online services, and uses a predetermined list of URLs known to be unacceptable, or at least questionable, sites (see fig. 24.1). The software also uses a list of labels and keywords to determine if an unknown site is acceptable given the supervisor's criteria. Unfortunately, none of these products could process labels provided by competitors. This left content providers with a dilemma: how to provide adequate warning for selection software without spending an inordinate amount of time complying with an ever-increasing number of conflicting products. There was a clear and present need for a content selection standard.

FIG. 24.1

Content selection software before PICS could block some pre-defined sites but not others.

The PICS Way

PICS separates the software from the labels. Any PICS-compatible selection software product can read PICS compliant labels implemented on a server. This means the content providers can implement one set of labels, and users can choose the selection software they desire (see fig. 24.2).

FIG. 24.2

PICS-compliant selection software uses labels provided by publishers and ratings services, as well as the supervisor's criteria, to provide more accurate content blocking.

The PICS standard does not provide selection software or a specific rating system. It merely establishes the conventions for describing rating systems and label formats. The open nature of the PICS standard allows different rating systems to be used by the same PICS-compliant software product. These ratings aren't restricted to adult-oriented material. As previously mentioned, the aerospace employer whose employees are addicted to sports scores may seek to block ESPN's Web site. A corporate intranet may use labels as a security measure for access to confidential materials published on the internal network.

The key element that allows selection software to read any set of labels is a new MIME type (application/pics-service) that specifies a standard format for describing a labeling service. The selection software reads service descriptions in this format to determine content labels. Supervisors use this information to configure their selection software. Listing 24.1 demonstrates a sample rating service based upon the MPAA movie rating scale.

Listing 24.1 Setting Up a Rating Service

((PICS-version 1.0)
 (rating-system "http://moviescale.org/Ratings/Description/")
 (rating-service "http://moviescale.org/v1.0")
 (icon "icons/moviescale.gif")
 (name "The Movies Rating Service")
 (description "A rating service based upon the MPAA's movie rating scale")
 (category
  (transmit-as "r")
  (name "Rating")
  (label (name "G") (value 0) (icon "icons/G.gif"))
  (label (name "PG") (value 1) (icon "icons/PG.gif"))
  (label (name "PG-13") (value 2) (icon "icons/PG-13.gif"))
  (label (name "R") (value 3) (icon "icons/R.gif"))
  (label (name "NC-17") (value 4) (icon "icons/NC-17.gif"))))

The initial section points to a URL that describes the labeling system and criteria for assigning ratings, an icon, a name, and a longer description of the service. Referring to resources by their URLs makes it possible to match a label with its associated resource, even if distributed separately. This separation is important, since it allows a ratings service to define labels for sites that may not wish to cooperate with the service. Anything named by a URL can have a label attached to it, including resources accessed by FTP, Gopher, HTTP, or Usenet. The PICS standard also defines a URL naming system for IRC so that supervisors can restrict access to unacceptable chat rooms.

The latter section of Listing 24.1 describes the category and its dimensions. In this case, there is only one category. Another example may have categories based upon Sex, Violence, and Language, each with its own series of levels. A supervisor creating a definition for a user, or group of users, must choose the level at which the selection software blocks content. If a user has a PG level clearance, the software denies access to any page rated PG-13 (value 2) or higher.
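To make the clearance comparison concrete, here is a small sketch in Python. The chapter prescribes no implementation language, so this is purely illustrative; the scale values come from Listing 24.1, while the function and variable names are invented for the example.

```python
# Illustrative sketch: blocking a page whose movie-scale rating exceeds a
# user's clearance. Scale values follow Listing 24.1; names are assumptions.

MOVIE_SCALE = {"G": 0, "PG": 1, "PG-13": 2, "R": 3, "NC-17": 4}

def is_blocked(user_clearance: str, page_rating: str) -> bool:
    """Return True if the page's rating value exceeds the user's clearance."""
    return MOVIE_SCALE[page_rating] > MOVIE_SCALE[user_clearance]

# A user cleared for PG can see G and PG pages, but not PG-13 or higher.
print(is_blocked("PG", "G"))      # False
print(is_blocked("PG", "PG-13"))  # True
```

Real selection software would read the scale from the service description rather than hard-coding it, but the comparison itself is this simple.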

Labels can include the following two optional security features:

- A message integrity check, in the form of an MD5 message digest
- A digital signature on the contents of the label itself

The message integrity check, in the form of an MD5 message digest, enables the software to detect whether or not something changed the resource after creating the label. A digital signature on the contents of the label itself allows the selection software to guarantee that a label was really created by the service it references. Think of this as a seal of approval. If you subscribe to a service that provides ratings, it can create a unique, unforgeable digital signature to associate with its labels. If a label claims to represent this service provider, a quick check of its signature verifies whether it's authentic or not.
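The integrity-check idea can be sketched in a few lines of Python (again purely illustrative; the function names are assumptions, not part of PICS): the service records an MD5 digest of the document when it creates the label, and the client recomputes the digest on retrieval and compares.

```python
# Sketch of the MD5 message-integrity check described above.
import hashlib

def md5_digest(document: bytes) -> str:
    """Hex MD5 digest of a document's bytes."""
    return hashlib.md5(document).hexdigest()

original = b"<html>A rated page</html>"
label_md5 = md5_digest(original)  # stored in the label at rating time

retrieved = b"<html>A rated page, edited later</html>"
if md5_digest(retrieved) != label_md5:
    print("Document changed since it was labeled")
```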

Rate that URL

As previously mentioned, PICS provides an open platform for multiple, independent rating services. A rating service is an individual, group, or organization that provides content labels for information provided on the Internet. The new MIME type, application/pics-service, is the base type for defining a label. Each label has a rating system that defines it.

A rating system specifies the dimensions used for labeling, the scale of allowable values on each dimension, and a description of the criteria used to assign values. The Movie Scale used in Listing 24.1 is a single dimension rating system with five allowable values, G through NC-17. Other rating systems may have multiple dimensions.

Each rating system is identified by a valid URL. By doing so, several services can use the same rating system and refer to it by its unique identifier (the URL). Using a URL to name a rating system allows users to access it and obtain a human readable description of the rating system. Although the format for this description is unspecified, it should provide a supervisor with enough information about the rating system to allow for an accurate decision on whether the system fits the supervisor's needs or not.

A content label, or rating, has three parts that contain information about a document.

Describing the application/pics-service Type

Listing 24.1 shows the general format of a MIME application/pics-label type. Unfortunately, it's not a complete example. As mentioned previously, a rating system can have more than one category. Likewise, these categories can have different options than the ones displayed in the sample. This section displays a few more examples. Web page designers who wish to provide their own PICS ratings implement these tags in their Web page design. Clients using PICS-compliant software translate these tags and allow or deny access to the site based upon how the client's rules and preferences interpret the information presented.

(category
 (transmit-as "Value")
 (name "Value Index")
 (min 0.0)
 (max 1.0))

This category tag still defines a transmission name and a longer name value. Notice there are some differences, though. Notably, this category does not provide any labels. Instead, it defines this category with a rating range of 0.0 to 1.0. A rating service can choose a value in between for the content, and supervisors can also choose a value inside the range to block content.
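A short Python sketch of how selection software might apply such a range-valued category; the cutoff logic and all names here are assumptions for illustration, not part of the PICS standard.

```python
# Illustrative sketch: blocking against a range-valued category. The
# supervisor picks a cutoff inside the declared min/max; the rating
# service assigns the page a value in the same range.

CATEGORY_MIN, CATEGORY_MAX = 0.0, 1.0

def exceeds_cutoff(page_value: float, cutoff: float) -> bool:
    """Return True if the page's rated value lies above the supervisor's cutoff."""
    assert CATEGORY_MIN <= page_value <= CATEGORY_MAX
    return page_value > cutoff

print(exceeds_cutoff(0.7, 0.5))  # True: block
print(exceeds_cutoff(0.3, 0.5))  # False: allow
```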

(category
 (transmit-as "Subject")
 (name "Document subject")
 (multivalue true)
 (unordered true)
 (label (name "Sex") (value 0))
 (label (name "Violence") (value 1))
 (label (name "Language") (value 2))
 (label-only))

The preceding category sample demonstrates a rating system for a document that contains multiple subjects. A supervisor may choose to deny access based upon any one of these labels, or perhaps a combination of labels.
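That any-or-combination rule amounts to a set intersection, sketched below in Python; the subject values follow the category sample above, while the function name and deny-list mechanism are assumptions made for the example.

```python
# Illustrative sketch: a multivalue category lets one document carry
# several subject labels at once. A supervisor might block a page if any
# of its subjects appears on a deny list.

SUBJECTS = {0: "Sex", 1: "Violence", 2: "Language"}

def blocked_by_subjects(label_values, denied_subjects):
    """Return True if any subject on the page is in the supervisor's deny list."""
    page_subjects = {SUBJECTS[v] for v in label_values}
    return bool(page_subjects & denied_subjects)

# A page labeled with values 1 and 2 carries the Violence and Language subjects.
print(blocked_by_subjects([1, 2], {"Violence"}))  # True
print(blocked_by_subjects([0], {"Violence"}))     # False
```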

(category
 (transmit-as "Color")
 (name "Picture color")
 (integer)
 (category
  (transmit-as "Hue")
  (label (name "Blue") (value 0))
  (label (name "Red") (value 1))
  (label (name "Green") (value 2)))
 (category
  (transmit-as "Intensity")
  (min 0)
  (max 255)))

This sample demonstrates that a category can also have sub-categories that make up its properties. Note the flexibility this combination allows: the Hue category provides three attributes, multiplied by an Intensity range of 256 values.

Some Semantic Issues

There are always rules for any kind of code, and the syntax of application/pics-service attributes is no exception. This section describes some of the syntax rules you must observe when creating your application/pics-service attributes.

Labels and Label Lists

The PICS specification uses a labellist to transmit a set of PICS labels. The format is the application/pics-labels MIME type. It transmits labels, as well as reasons why a label may be unavailable, along with a document. The labellist is always surrounded by parentheses and begins with the PICS version number. The following fragment displays the general form of a label list:

labellist :: '(' version service-info+ ')'

A label list either specifies that no labels are available, or it is separated into sections of labels for each rating service. The URL of each service is specified as the serviceID. Following the URL is either an error message indicating why no labels are available from that service, or an overall set of optional information followed by the keyword labels and the labels from the service. The optional information applies to every label from the service, unless a specific label has its own option to override the general service option.
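Because the entire label syntax is parenthesized, a client can read it with a small recursive parser. The Python sketch below is a toy, not a conforming PICS reader: it handles only simple quoted strings and atoms, ignoring escape sequences and error clauses, which is enough to walk the fragments shown in this chapter.

```python
# Minimal illustrative parser for the parenthesized PICS list syntax,
# producing nested Python lists. Assumes simple quoted strings and atoms.
import re

def tokenize(text):
    """Split into '(', ')', quoted strings, and bare atoms."""
    return re.findall(r'\(|\)|"[^"]*"|[^\s()"]+', text)

def parse(tokens):
    """Consume tokens and return one parsed expression."""
    token = tokens.pop(0)
    if token == '(':
        node = []
        while tokens[0] != ')':
            node.append(parse(tokens))
        tokens.pop(0)  # discard the closing ')'
        return node
    return token.strip('"')

sample = '(PICS-version 1.0)'
print(parse(tokenize(sample)))  # ['PICS-version', '1.0']
```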

Embedding Labels in HTML

You can embed PICS labels in HTML files as meta-information using the HTML META element. Use the HTTP header equivalence, as in the following example:

<META http-equiv="PICS-Label" content='labellist'>

PICS includes an extension to HTML that allows a client to request one or more labels be included in a header along with a document. HTTP servers should only include PICS label headers if requested by the client, and then only include labels from services requested by the client. Including unrequested labels adds an unnecessary burden on bandwidth and serves no purpose, since the client ignores those labels.
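On the client side, pulling an embedded labellist out of a page's META tags is straightforward. Here is an illustrative sketch using Python's standard-library HTML parser; the http-equiv convention follows the META example above, and the class name and sample page are invented for the example.

```python
# Illustrative sketch: extracting PICS-Label META content from HTML.
from html.parser import HTMLParser

class PICSLabelFinder(HTMLParser):
    """Collects the content of every PICS-Label META tag in a page."""
    def __init__(self):
        super().__init__()
        self.labels = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # parser lowercases tag and attribute names
        if tag == "meta" and attrs.get("http-equiv", "").lower() == "pics-label":
            self.labels.append(attrs.get("content"))

page = '<html><head><META http-equiv="PICS-Label" content="(PICS-1.0 ...)"></head></html>'
finder = PICSLabelFinder()
finder.feed(page)
print(finder.labels)  # ['(PICS-1.0 ...)']
```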

An example PICS client request appears as follows:

GET some.html HTTP/1.0
Protocol-Request: {PICS-1.0 {params full
     {services "http://www.someplace.org/v1.0"}}}

Here's what has happened so far. First, the client sends a request for some.html, a document on a Web server. The client's request asks for the full label of the document from the rating service at http://www.someplace.org. A client interested in examining a rating before retrieving the full document can substitute the word HEAD instead of GET in the request. The server responds with the header shown in the following listing, but not the document itself. This provides the user an opportunity to check ahead to see if the document has a suitable rating before spending the time to retrieve the document.

The server's response to the client appears as follows:

HTTP/1.0 200 OK
Date: Tuesday, 07-May-96 13:35:34 GMT
MIME-version: 1.0
Last-modified: Thursday, 07-May-96 05:12:47 GMT
Protocol: {PICS-1.0 {headers PICS-Label}}
PICS-Label:
 (PICS-1.0 "http://www.someplace.org/v1.0" labels
  on "1994.11.05T08:15-0500"
  exp "1995.12.31T23:50-0000"
  for "http://www.website.com/some.html"
  by "William Beem"
  ratings (value 0.7 intensity 0 color/hue 1))
Content-type: text/html

...contents of some.html...

The server responds by sending back the label in a PICS-Label header, and also the requested document. The format of the PICS-Label headers field (labellist) allows the server to reply with either a label or an explanation of why the label isn't available. It's inappropriate for a server to generate an HTTP error status if the document is available, but the labels are unavailable.

Selection software can also request PICS labels separately from the documents to which they refer. In order to do this, a client contacts a label bureau. Label bureaus are HTTP servers that understand a particular query syntax. A label bureau provides labels for documents on other servers and for documents available through protocols other than HTTP. These label bureaus are most likely run by rating services, and may charge a fee for label queries. In fact, the PICS standards documentation encourages rating services to act as label bureaus.

Suppose a ratings service has a URL on the Web called http://www.ratem.org/Ratings. It decides to run a label bureau to dispense labels. The following sample asks the bureau to send a single label that applies to everything in the /images directory of another site, http://www.nasty.net.

GET /Ratings?opt=generic&
    u="http%3A%2F%2Fwww.nasty.net%2Fimages"&
    s="http%3A%2F%2Fwww.ratem.org" HTTP/1.0

Upon receiving this query, the server sends back a MIME application/pics-label type document. The document should be complete, containing all options contained within the label. From this information, the client can decide whether to proceed into the /images directory or not.
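The percent-encoding in that query (%3A for the colon, %2F for each slash) is ordinary URL escaping, which a client can produce with a standard library call. The following Python sketch builds the same query string; the function name is an assumption, while the parameter names (opt, u, s) follow the sample request above.

```python
# Illustrative sketch: building a percent-encoded label-bureau query.
from urllib.parse import quote

def bureau_query(option, target_url, service_url):
    """Compose a /Ratings query with the embedded URLs fully escaped."""
    return ('/Ratings?opt=%s&u="%s"&s="%s"'
            % (option, quote(target_url, safe=""), quote(service_url, safe="")))

query = bureau_query("generic", "http://www.nasty.net/images", "http://www.ratem.org")
print(query)
# /Ratings?opt=generic&u="http%3A%2F%2Fwww.nasty.net%2Fimages"&s="http%3A%2F%2Fwww.ratem.org"
```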

Digital Signatures

As discussed in Chapter 8, "Security," digital signatures provide a mechanism to determine whether the contents of a data file or document were altered. The PICS standard makes use of this technology to determine whether a document changed since it was last labeled. This is quite a frequent occurrence, since Web pages tend to change on a regular basis. It's possible that a change can happen due to some unauthorized access, but PICS labels have three option fields intended to identify and deter such events.

The next problem is to ensure that the labels received from a rating service are indeed coming from that service, and that they haven't been altered during transmission. PICS uses digital signatures to address both problems.

The rating service signs its labels with a public key pair. The service keeps its private key secret and distributes the public key to anyone who wishes to use the service. Upon creating the label, the service computes a message digest of the label using the MD5 algorithm and then encrypts it with the service's private key. The result of this process is the digital signature. The signature gets converted to US-ASCII using a base64 encoding technique and is then stored in the signature-rsa-md5 option of the label it transmits to the client.

After receiving the label, the client can verify the signature by converting the label back to binary form and re-computing the message digest. The client must also convert the contents of the signature-rsa-md5 option back to binary and decrypt it using the service's public key. Finally, a comparison of the new message digest and the decrypted message digest should provide an exact match if the document is authentic. While this sounds like a lot of work for the user, that's not really the case. It's a lot of work for the programmer, who should automate all of these steps when a user receives a label.

PICS specifically requires the use of the RSA signature algorithm with the MD5 message digest. This may change in the future; such a change only requires a new label option that supports a different algorithm pair. PICS does not specify the key length necessary for encryption. This detail is left to the users and rating services to decide.

SafeSurf: A Ratings Example

SafeSurf is a company that provides content selection software and a rating service targeted at parents who want to protect their children from inappropriate content on the Internet. Parents using the SafeSurf Rating Standard can activate several layers of blocking based upon what the parents approve. Something parents find unsuitable for a small child may perhaps be less restricted for a teenager. SafeSurf maintains a master database of categories and adds new categories in as timely a manner as possible. This section illustrates a rating scheme using SafeSurf's published categories and their associated levels.

Table 24.1 is a guide that relates to the numbered items presented in the lists that follow. Parents can choose any level they want, regardless of title.

Table 24.1 SafeSurf Ratings Scheme

Adult Themes with Caution Levels

Age Range (category 0)

Caution Level Description
1 All Ages
2 Older Children
3 Teens
4 Older Teens
5 Adult Supervision Recommended
6 Adults
7 Limited to Adults
8 Adults Only
9 Explicitly for Adults

Profanity

Rating Number Rating Name Description
1 Subtle Innuendo Subtly implied through the use of slang
2 Explicit Innuendo Explicitly implied through the use of slang
3 Technical Reference Dictionary, encyclopedia, news, technical references
4 Non-Graphic-Artistic Limited non-sexual expletives used in an artistic fashion
5 Graphic-Artistic Non-sexual expletives used in an artistic fashion
6 Graphic Limited use of expletives and obscene gestures
7 Detailed Graphic Casual use of expletives and obscene gestures
8 Explicit Vulgarity Heavy use of vulgar language and obscene gestures. Unsupervised Chat Rooms
9 Explicit and Crude Saturated with crude sexual references and gestures. Unsupervised Chat Rooms

Heterosexual Themes

1 Subtle Innuendo Subtly implied through the use of a metaphor
2 Explicit Innuendo Explicitly implied (not described) through the use of a metaphor
3 Technical Reference Dictionary, encyclopedia, news, technical references
4 Non-Graphic-Artistic Limited metaphoric descriptions used in an artistic fashion
5 Graphic-Artistic Metaphoric descriptions used in an artistic fashion
6 Graphic Descriptions of intimate sexual acts
7 Detailed Graphic Descriptions of intimate details of sexual acts
8 Explicitly Graphic or Inviting Participation Explicit descriptions of intimate details of sexual acts designed to arouse. Inviting interactive sexual participation. Unsupervised Sexual Chat Rooms or Newsgroups.
9 Explicit and Crude or Explicitly Inviting Participation Profane graphic descriptions of intimate details of sexual acts designed to arouse. Inviting interactive sexual participation. Unsupervised Sexual Chat Rooms or Newsgroups.

Homosexual Themes

1 Subtle Innuendo Subtly implied through the use of a metaphor
2 Explicit Innuendo Explicitly implied (not described) through the use of a metaphor
3 Technical Reference Dictionary, encyclopedia, news, technical references
4 Non-Graphic-Artistic Limited metaphoric descriptions used in an artistic fashion
5 Graphic-Artistic Metaphoric descriptions used in an artistic fashion
6 Graphic Descriptions of intimate sexual acts
7 Detailed Graphic Descriptions of intimate details of sexual acts
8 Explicitly Graphic or Inviting Participation Explicit descriptions of intimate details of sexual acts designed to arouse. Inviting interactive sexual participation. Unsupervised Sexual Chat Rooms or Newsgroups.
9 Explicit and Crude or Explicitly Inviting Participation Profane graphic descriptions of intimate details of sexual acts designed to arouse. Inviting interactive sexual participation. Unsupervised Sexual Chat Rooms or Newsgroups.

Nudity

1 Subtle Innuendo
2 Explicit Innuendo
3 Technical Reference
4 Non-Graphic Artistic
5 Graphic Artistic
6 Graphic
7 Detailed Graphic
8 Explicit Vulgarity
9 Explicit and Crude

Violence

1 Subtle Innuendo
2 Explicit Innuendo
3 Technical Reference
4 Non-Graphic-Artistic
5 Graphic-Artistic
6 Graphic
7 Detailed Graphic
8 Inviting Participation in Graphic Interactive Format
9 Encouraging Personal Participation, Weapon Making

Sex, Violence, and Profanity

1 Subtle Innuendo
2 Explicit Innuendo
3 Technical Reference
4 Non-Graphic-Artistic
5 Graphic-Artistic
6 Graphic
7 Detailed Graphic
8 Explicit Vulgarity
9 Explicit and Crude

Intolerance

1 Subtle Innuendo
2 Explicit Innuendo
3 Technical Reference
4 Non-Graphic-Literary
5 Graphic-Literary
6 Graphic Discussions
7 Endorsing Hatred
8 Endorsing Violent or Hateful Action
9 Advocating Violent or Hateful Action

Glorifying Drug Use

1 Subtle Innuendo
2 Explicit Innuendo
3 Technical Reference
4 Non-Graphic-Artistic
5 Graphic-Artistic
6 Graphic
7 Detailed Graphic
8 Simulated Interactive Participation
9 Soliciting Personal Participation

Other Adult Themes

1 Subtle Innuendo
2 Explicit Innuendo
3 Technical Reference
4 Non-Graphic-Artistic
5 Graphic-Artistic
6 Graphic
7 Detailed Graphic
8 Explicit Vulgarity
9 Explicit and Crude

Gambling

1 Subtle Innuendo
2 Explicit Innuendo
3 Technical Discussion
4 Non-Graphic-Artistic, Advertising
5 Graphic-Artistic, Advertising
6 Simulated Gambling
7 Real Life Gambling without Stakes
8 Encouraging Interactive Real Life Participation with Stakes
9 Providing Means with Stakes

From Here...

Content controls provide a means for parents and employers to decide what is acceptable for their users to access. The key element is that supervisors have a choice in what they allow, rather than a government body dictating standards for everyone. Content controls also provide enough flexibility that different users can have customized views of the Internet, and they also allow a child to grow and access a larger variety of content as some restrictions become unnecessary.

This chapter provided an overview of a topic in its infancy. The PICS standard is a version 1.0 release, which suggests that it will undergo more changes in the coming months. To keep abreast of the most current information, follow the PICS material published by MIT's World Wide Web Consortium and by the rating services mentioned in this chapter. Don't worry; they're all safe.



© 1996, QUE Corporation, an imprint of Macmillan Publishing USA, a Simon and Schuster Company.